MauveDB: Statistical Modeling of Data inside Relational Databases

Author

  • Amol Deshpande

Abstract

Sensor networks, almost by definition, present complex data-quality issues. This is especially true for networks of mobile devices, which have a higher potential for data loss and errors. Moreover, the physical context, especially location, plays a very important role for mobile devices and must be considered when interpreting the sensor readings. The traditional approach to these issues is to incorporate a statistical model of the real-world phenomenon into the processing of sensor data. Models help provide more robust interpretations of sensor readings by accounting for spatial and temporal biases, by identifying faults, and by allowing estimation of missing values or future states. We illustrate this through two representative sensor data processing tasks.

Re-gridding using interpolation or regression models: Sensor data, especially data generated by mobile sensor networks, is typically incomplete because of sensor node failures and communication loss. Such data must be gap-filled and possibly re-gridded into a uniform time-space dataset that is easier to visualize and analyze (Figure 1). A variety of regression or interpolation models can be used for this purpose.

Inferring hidden variables: For many deployments, the variables of interest are not directly observable using sensors and must be inferred from the sensed data using a model of the physical process. The inference task could be as simple as estimating the true position of a mobile device from noisy GPS data (using Kalman filters) or detecting failures (using an HMM), or as complex as inferring the transportation mode of a user given just the noisy location information [3]. Models like dynamic Bayesian networks can be used for this purpose.

Unfortunately, today's data management systems do not support such statistical modeling tasks natively, forcing users to rely on external tools. This process is time-consuming, leads to much repetition of functionality and, more importantly, requires familiarity with advanced data analysis tools like Matlab or R. Our research is motivated by the question of how to make it easier for a user or an application developer to apply simple statistical models to data, preferably through declarative interfaces. In particular, can we push this process inside a database system, so that existing database tools such as data cubes, which can be immensely useful for visualizing enormous multi-dimensional datasets, can be used for analyzing sensor data?
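As a rough illustration of the re-gridding task described in the abstract, the following sketch fits a simple linear regression to irregularly timed readings and evaluates it on a uniform time grid. It is not the MauveDB interface, which the abstract does not describe. The function names (`fit_line`, `regrid`) and the sample readings are hypothetical, and a real deployment would likely need a richer model than a single straight line.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of value = intercept + slope * time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept, slope


def regrid(times, values, start, stop, step):
    """Evaluate the fitted line on a uniform grid from start to stop inclusive."""
    intercept, slope = fit_line(times, values)
    n_points = int(round((stop - start) / step)) + 1
    grid = [start + i * step for i in range(n_points)]
    return [(t, intercept + slope * t) for t in grid]


# Hypothetical readings with gaps (no samples at t = 2, 3, 5, 6).
times = [0.0, 1.0, 2.5, 4.0, 7.0]
values = [20.0, 20.5, 21.25, 22.0, 23.5]

# Gap-filled, uniformly spaced estimates at t = 0, 1, ..., 7.
regridded = regrid(times, values, start=0.0, stop=7.0, step=1.0)
```

The output is a list of `(time, estimate)` pairs on the uniform grid, which could then be loaded into a table and queried like any other relation.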

Similar resources

Exchangeable databases and their functional representation

We consider the task of statistical inference for data in the form of a relational database comprising multiple relations acting on heterogenous sets of objects. We define a notion of exchangeability for databases generalizing that of arrays, based on the idea that the objects over which the relations act are themselves exchangeable. When the data are encoded in the form of several multi-dimens...

Statistical Relational Learning for Document Mining

A major obstacle to fully integrated deployment of many data mining algorithms is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. We propose an integrated approach to statistical modeling from relational databases. We structure the search space based on “refinement graphs”, which are widely used in inductive logic progra...

FactorBase: SQL for Learning A Multi-Relational Graphical Model

We describe FACTORBASE, a new SQL-based framework that leverages a relational database management system to support multi-relational model discovery. A multi-relational statistical model provides an integrated analysis of the heterogeneous and interdependent data resources in the database. We adopt the BayesStore design philosophy: statistical models are stored and managed as first-class citiz...

MASTER OF SCIENCE Computational Mathematics and Modern Information Technologies

Entity-Relationship Data Model: Data structuring, Entity-Relationship Diagrams, Equivalence of Entity-Relationship and the Functional Modeling, Algorithms for translating Entity-Relationship Diagrams into Relational and Elementary Mathematical Data Models. Relational Data Model: The structure of the Relational Data Model, Relational Algebra, Relational Calculus, Relational Query Languages, Stati...

SQL for SRL: Structure Learning Inside a Database System

The position we advocate in this paper is that relational algebra can provide a unified language for both representing and computing with statistical-relational objects, much as linear algebra does for traditional single-table machine learning. Relational algebra is implemented in the Structured Query Language (SQL), which is the basis of relational database management systems. To support our p...

Publication date: 2007